DETAILED ACTION



Continued Examination Under 37 CFR 1.114 

A request for continued examination under 37 CFR 1.114, including the fee set forth in
37 CFR 1.17(e), was filed in this application after final rejection. Since this application is
eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR
1.17(e) has been timely paid, the finality of the previous Office action has been
withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 10/29/2009 has
been entered. 



Response to Arguments 

Applicant's arguments pertaining to the prior art rejections have been fully 
considered but they are not persuasive. Firstly, Applicant states that "Applicant did not
previously make a 'teaches away' argument". However, in the second paragraph on
page 19 of Applicant's arguments received by the office on 03/31/2009, Applicant stated
"In fact, Heinzmann expressly teaches away from claim 1, by expressly disclosing that
perspective transformation should not be used, because affine transformations are 
allegedly superior in terms of simplicity of calculations, decreased ambiguity, and 
speed" (emphasis added). This statement by applicant is what prompted the office to 
respond by indicating the impropriety of such arguments in Final Office action dated 
07/01/2009. 

Applicant's amendment overcomes the 35 USC 102 rejections using the
Heinzmann reference. However, the 35 USC 103 rejections are maintained. Applicant
asserts that because the teachings of Heinzmann incorporate the use of affine 
transformations, they cannot be combined with perspective transformations. Applicant 
also states that such a combination would render Heinzmann unsatisfactory for its 
intended purpose. These statements are unsubstantiated and merely conclusory in 
nature. There is no reasoning or explanation provided to support these statements. That 
is, what prohibits the teachings of the Heinzmann reference from being combined with 
perspective transformations, what is the intended purpose of Heinzmann, and why 
would the combination render Heinzmann unsatisfactory for that intended purpose? 

One of ordinary skill in the art would be quite well aware that perspective 
transformation can be combined with the teachings of the Heinzmann reference. Affine 
transformations allow for translations, rotations, scaling, and skewing where parallel 
lines will remain parallel after the transformation. Perspective transformations also allow 
for translations, rotations, scaling, and skewing, but have an additional degree of 
freedom such that parallel lines need not remain parallel after the transformation. 
Indeed, affine transformations are merely a subset of perspective transformations, with
a constraint such that parallel lines must remain parallel after the transformation.
Therefore, there is no reason why a perspective transformation cannot be used in 
applying estimates of the coordinates of the predetermined feature point instead of an 
affine transformation. 
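
The subset relationship described above can be illustrated with a minimal sketch. It assumes the common 3x3 homogeneous-matrix representation of planar transformations, which neither the Office action nor the cited references are confirmed to use; the function names and the example matrices are hypothetical and chosen only for illustration. An affine transformation is modeled as a matrix whose last row is (0, 0, 1), and a general perspective transformation as one whose last row is unconstrained.

```python
# Illustrative sketch only: the 3x3 homogeneous representation and all names
# below are assumptions, not taken from Heinzmann, Park, or the application.

def apply_homography(H, point):
    """Map a 2-D point through a 3x3 matrix H using homogeneous coordinates."""
    x, y = point
    xh = H[0][0] * x + H[0][1] * y + H[0][2]
    yh = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return (xh / w, yh / w)


def direction(p, q):
    """Direction vector from point p to point q."""
    return (q[0] - p[0], q[1] - p[1])


def are_parallel(d1, d2, tol=1e-9):
    """Two 2-D directions are parallel when their cross product vanishes."""
    return abs(d1[0] * d2[1] - d1[1] * d2[0]) < tol


def maps_parallel_to_parallel(H):
    """Check whether one fixed pair of parallel segments stays parallel under H."""
    a0, a1 = (0.0, 0.0), (1.0, 0.0)   # segment on the line y = 0
    b0, b1 = (0.0, 1.0), (1.0, 1.0)   # parallel segment on the line y = 1
    da = direction(apply_homography(H, a0), apply_homography(H, a1))
    db = direction(apply_homography(H, b0), apply_homography(H, b1))
    return are_parallel(da, db)


# Last row (0, 0, 1): translation, rotation, scaling, and skew only.
AFFINE = [[2.0, 0.5, 1.0],
          [0.0, 1.5, -2.0],
          [0.0, 0.0, 1.0]]

# Same upper rows, but a non-trivial last row adds the perspective terms.
PERSPECTIVE = [[2.0, 0.5, 1.0],
               [0.0, 1.5, -2.0],
               [0.1, 0.2, 1.0]]

if __name__ == "__main__":
    print(maps_parallel_to_parallel(AFFINE))
    print(maps_parallel_to_parallel(PERSPECTIVE))
```

In this representation, setting the first two entries of the last row to zero turns the perspective matrix into the affine one, which is the sense in which the same machinery can reproduce an affine result.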

The intended purpose of the Heinzmann reference (for the instant claim
limitations) is gaze-point estimation. There is no reason why the use of a perspective
transformation (instead of an affine transformation) would cause an unsatisfactory
gaze-point estimation. Note that the perspective transformation has an additional
degree of freedom, such that nothing is lost by using the perspective transformation 
instead of the affine transformation. A perspective transformation matrix can be used to 
achieve the exact same result as an affine transformation. Note that a perspective 
transformation can be used to achieve only a single one or any combination of 
translations, rotations, scaling, skewing, and causing parallel lines to no longer be 
parallel. Also note that the Park reference has the same intended purpose as the
Heinzmann reference and uses perspective transformations. 

Applicant also argues that Heinzmann notes that "the required calculations [of 
perspective transformations] are complex and time consuming." However, Applicant 
takes this excerpt out of context. Heinzmann's discussion of the perspective 
transformation in this regard is with respect to pose estimation. Indeed, the paragraph 
that this excerpt is taken from (Page 144, col 1, para 5) begins: "Two different
transformations may be used for pose estimation from monocular data: perspective or
affine transformation. The perspective transformation precisely models the actual
projection of a 3-D scene to the image plane. However, the required calculations are
complex and time consuming and can deliver up to a fourfold ambiguity in the estimate
of the pose" (emphasis added). Therefore, Heinzmann states that perspective
transformations are complex for pose estimation, but is silent with regard to pose 
estimation for eye-gaze vector estimation (as is applied to the instant claim limitations).
One of ordinary skill in the art would readily see that the transformation pertaining to an 
entire facial pose would be more computationally intensive and complex than that of an 
eye-gaze estimation. The facial pose estimation involves the consideration of numerous 
parameters while the eye-gaze estimation involves the consideration of a single gaze
point.

Claim Rejections - 35 USC § 103 

The following is a quotation of 35 U.S.C. 103(a) which forms the basis for all 
obviousness rejections set forth in this Office action: 

(a) A patent may not be obtained though the invention is not identically disclosed or described as set 
forth in section 102 of this title, if the differences between the subject matter sought to be patented and
the prior art are such that the subject matter as a whole would have been obvious at the time the 
invention was made to a person having ordinary skill in the art to which said subject matter pertains. 
Patentability shall not be negatived by the manner in which the invention was made. 

Claims 1-10 and 23-29 are rejected under 35 U.S.C. 103(a) as being unpatentable
over J. Heinzmann and A. Zelinsky, "3-D facial pose and gaze point estimation using a
robust real-time tracking paradigm," IEEE Int. Workshop on Automatic Face and
Gesture Recognition, pp. 142-147, 1998 (Heinzmann) in view of Park, K. R., et al.,
"Gaze position detection by computing the three dimensional facial positions and 
motions," Pattern Recognition, Vol. 35, No. 11, Nov. 2002, pp. 2559-2569 (Park). 

As per claim 1, Heinzmann teaches an image processing apparatus for estimating a
motion of a predetermined feature point of a 3D object from a motion picture of the 3D
object taken by a monocular camera, comprising (Limitations present only within the
preamble are not given patentable weight):

observation vector extracting means for extracting projected coordinates 
of the predetermined feature point onto an image plane, from each of frames of the
motion picture (Heinzmann: page 142, col 2, para 2: "forwarded to the 2-D model... 
image plane... 2-D image positions of the features"; Fig. 1); 

3D model initializing means for making the observation vector extracting 
means extract from an initial frame of the motion picture, initial projected coordinates in 
a model coordinate arithmetic expression for calculation of model coordinates of the 
predetermined feature point on the basis of a first parameter, a second parameter, and 
the initial projected coordinates (Heinzmann: Fig. 1; abstract: "3-D model... initialize 
the feature tracking": parameters: abstract: "feature positions... gaze direction...
head rotation": Fig. 1: "feature positions... relative positions". Fig. 1 shows that 
the projected coordinates are extracted from the 2-D model into the 3-D model, 
page 142, col 2, para 2: "2-D image positions of the features are transferred to a 3-D
model of the feature locations"; page 144, col 1, para 5 - col 2, para 1: "affine
transformation... a good approximation of perspective projection provided the 
depth of the object does not exceed 1/10 of the distance between camera and 
object. This is usually the case in face tracking applications.": Therefore, a 
parameter is the depth of object which is not expected to exceed 1/10 of the 
distance between the camera and the object; page 144, col 2, paras 2-3: "angle"; 
page 144, col 2, paras 4-5: "theta... orientations"; Fig. 3: "camera coordinates...
angles"; Fig. 2: "angles"; page 145, col 1, para 3 - col 2, para 1: "distance and
orientation"; page 145, col 2, para 2: "depth". Fig. 4: shows 9 parameters, page 
146, col 1, para 3 - col 2, para 1: "Figure 4 shows the output of some tracking 
parameter including the rotational angles, the displacement, the gaze direction of 
both eyes and the uncertainty of the face tracking"); and 

motion estimating means for calculating estimates of state variables 
including a third parameter in a motion arithmetic expression for calculation of 
coordinates of the predetermined feature point at a time of photography when a 
processed target frame of the motion picture different from the initial frame was taken, 
from the model coordinates, the first parameter, and the second parameter, and for 
outputting an output value about the motion of the predetermined feature point on the 
basis of the second parameter included in the estimates of the state variables 
(Heinzmann: page 142, col 2, para 2: "The estimated positions of the features 
determine the location within the next image frame of the hardware search 
windows." Note that the state variables include the parameters that were listed 
above: page 142, col 2, para 2: "3-D triplets"; Fig. 1: "3-D pose": output), 

wherein the model coordinate arithmetic expression is based on back 
projection of the monocular camera, the first parameter is a parameter independent of a 
local motion of a portion including the predetermined feature point, and the second 
parameter is a parameter dependent on the local motion of the portion including the 
predetermined feature point (Heinzmann: page 142, col 2, para 2: "The 3-D model is 
also projected back into the image plane to adapt the constraints in the 2-D 
model."; abstract: "monocular"; page 144, col 1, para 5 - col 2, para 1: "affine
transformation... a good approximation of perspective projection provided the 
depth of the object does not exceed 1/10 of the distance between camera and 
object. This is usually the case in face tracking applications.": Therefore, a 
parameter is the depth of object which is not expected to exceed 1/10 of the 
distance between the camera and the object. This parameter is independent of 
local motion. Note that parameters are also listed above, page 145, col 2, para 2: 
"monocular"), and 

wherein the motion estimating means:

calculates predicted values of the state variables at the time of photography when the
processed target frame was taken, based on a state transition model (Heinzmann:
page 143, col 2, para 6: "probabilistic relocation of features based on template 
correlations and a simple 2-D facial model"); 

applies the initial projected coordinates, and the first parameter and the 
second parameter included in the predicted values of the state variables, to the model 
coordinate arithmetic expression to calculate estimates of the model coordinates at the 
time of photography (Heinzmann: Figs. 1 and 3: Note that every frame corresponds
to a time of photography);

applies the third parameter in the predicted values of the state variables 
and the estimates of the model coordinates to the motion arithmetic expression to 
calculate estimates of coordinates of the predetermined feature point at the time of 
photography (Heinzmann: page 146, col 1, para 1-2: third parameter can be
interpreted to be confidence; Figs. 1 and 3. See arguments made above for
parameters.);

applies the estimates of the coordinates of the predetermined feature point 
to an observation function based on an observation model of the monocular camera to 
calculate estimates of an observation vector of the predetermined feature point 
(Heinzmann: page 146, col 1, para 1-2. Figs. 1 and 3); 

makes the observation vector extracting means extract the projected 
coordinates of the predetermined feature point from the processed target frame, as the 
observation vector (Heinzmann: page 145, col 1, para 3: "gaze vector"; Figs. 1 and 
3; page 146, col 1, para 2: "gaze vector"); and 

filters the predicted values of the state variables by use of the extracted 
observation vector and the estimates of the observation vector to calculate estimates of 
the state variables at the time of photography (Heinzmann: Fig. 1: "Kalman filtering". 
Note that every frame corresponds to a time of photography. As stated above, the 
state variables include the parameters. A coordinate is an observation vector 
originating from the origin in the corresponding coordinate space; page 145, col 
1, para 3: "gaze vector"; Figs. 1 and 3; page 146, col 1, para 2: "gaze vector... 
Intersecting G, with a world model yields the gaze point"). 
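
The predict, observe, and filter sequence recited in the limitations above can be sketched, under stated assumptions, as one cycle of a scalar linear Kalman filter. This is a generic textbook formulation and is not asserted to be the method of Heinzmann or of the application; the function name `kalman_step`, the scalar state, and the default noise values are hypothetical and chosen only for illustration.

```python
# Illustrative sketch only: a generic scalar Kalman filter cycle. The names,
# the scalar state, and the noise values are assumptions for illustration.

def kalman_step(x_est, p_est, z, f=1.0, q=0.01, h=1.0, r=0.1):
    """Run one predict/update cycle and return the new estimate and variance.

    x_est, p_est: previous state estimate and its variance
    z:            observation extracted from the current frame
    f, q:         state transition coefficient and process noise variance
    h, r:         observation coefficient and observation noise variance
    """
    # Predicted values of the state variable from the state transition model.
    x_pred = f * x_est
    p_pred = f * p_est * f + q

    # Estimate of the observation obtained from the observation function.
    z_pred = h * x_pred

    # Filtering: blend the prediction with the extracted observation.
    s = h * p_pred * h + r          # innovation variance
    k = p_pred * h / s              # Kalman gain
    x_new = x_pred + k * (z - z_pred)
    p_new = (1.0 - k * h) * p_pred
    return x_new, p_new


if __name__ == "__main__":
    x, p = 0.0, 1.0
    for z in (1.0, 1.1, 0.9):       # hypothetical per-frame observations
        x, p = kalman_step(x, p, z)
    print(x, p)
```

Each call corresponds to one processed frame: the previous estimate is propagated, compared with the observation taken from that frame, and corrected in proportion to the gain.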

Heinzmann does not teach applying the estimates of the coordinates of the
predetermined feature point to an observation function using a perspective
transformation. Park teaches applying the estimates of the coordinates of the
predetermined feature point to an observation function using a perspective
transformation (Park: page 2562 - page 2563: particularly page 2563, col 1, para 1:
"perspective camera model"; page 2568, col 1, para 3: "perspective 
transformation"; Fig. 5). 

Thus, it would have been obvious for one of ordinary skill in the art at the time the
invention was made to implement the teachings of Park into Heinzmann since
Heinzmann suggests a system for determining face and gaze positions using a
perspective transformation in general and Park suggests the beneficial use of a system
for determining face and gaze positions using a perspective transformation so as to
"obtain the exact 3D positions of the initial feature points" (Park: page 2568, col 1, para
3) in the analogous art of image processing. It would have been obvious for one of
ordinary skill in the art at the time the invention was made to implement the teachings of 
Park into Heinzmann since Heinzmann suggests the motivation "precisely models the 
actual projection" (Heinzmann: page 144, col 1, para 5). Therefore, both the Heinzmann 
and Park references highlight that the perspective transformation has the benefit of 
precision. Furthermore, one of ordinary skill in the art at the time the invention was 
made could have combined the elements as claimed by known methods and, in 
combination, each component functions the same as it does separately. One of ordinary 
skill in the art at the time the invention was made would have recognized that the results 
of the combination would be predictable. 

As per claim 2, Heinzmann in view of Park teaches the image processing
apparatus according to Claim 1, wherein the first parameter is a static parameter to
converge at a specific value, and wherein the second parameter is a dynamic
parameter to vary with the motion of the portion including the predetermined feature
point (Heinzmann: See arguments made for rejection claim 1: The static parameter
can be interpreted to be the length (or depth) of the gaze vector that converges to
a specific gaze point (page 146, col 1, para 2). The second dynamic value is the
angle or orientation that varies over time along with the motion).

As per claim 3, Heinzmann in view of Park teaches the image processing 
apparatus according to Claim 2, wherein the static parameter is a depth from the image 
plane to the predetermined feature point (Heinzmann: See arguments made for 
rejection claim 1, 2: The depth of the feature from the image plane is considered 
as a parameter.). 

As per claim 4, Heinzmann in view of Park teaches the image processing 
apparatus according to Claim 2, wherein the dynamic parameter is a rotation parameter 
for specifying a rotation motion of the portion including the predetermined feature point 
(Heinzmann: See arguments made for rejection claim 1, 2: The rotation is 
considered as a parameter). 

As per claim 5, Heinzmann in view of Park teaches the image processing
apparatus according to Claim 4, wherein the rotation parameter is an angle made by a 
vector from an origin to the predetermined feature point, relative to two coordinate axes 
in a coordinate system whose origin is at a center of the portion including the 
predetermined feature point (Heinzmann: See arguments made for rejection claim 1: 
page 146, col 1: "eye orientation... alpha_x, alpha_y... origin is located between 
the eyes"). 

As per claim 6, Heinzmann in view of Park teaches the image processing 
apparatus according to Claim 1, wherein the first parameter is a rigid parameter, and
wherein the second parameter is a non-rigid parameter (Heinzmann: See arguments
made for rejection claim 1, 2: The depth is the rigid parameter, and the
angle/orientation is the non-rigid parameter. Also, affine and perspective
transformations are non-rigid transformations, but the depth would not be affected
by the transformations).

As per claim 7, Heinzmann in view of Park teaches the image processing 
apparatus according to Claim 6, wherein the rigid parameter is a depth from the image 
plane to the model coordinates (Heinzmann: See arguments made for rejection 
claim 1, 6.). 

As per claim 8, Heinzmann in view of Park teaches the image processing
apparatus according to Claim 6, wherein the non-rigid parameter is a change amount
about a position change of the predetermined feature point due to the motion of the
portion including the predetermined feature point (Heinzmann: See arguments made
for rejection claim 1, 5.).

As per claim 9, Heinzmann in view of Park teaches the image processing 
apparatus according to Claim 1, wherein the motion model is based on rotation and 
translation motions of the 3D object, and wherein the third parameter is a translation 
parameter for specifying a translation amount of the 3D object and a rotation parameter 
for specifying a rotation amount of the 3D object (Heinzmann: See arguments made 
for rejection claim 1, 2, and 5: Fig. 4: "Disp X... Disp Y": translation; Fig. 1: 
"template tracking" Template tracking or matching accounts for in-plane 
translations. Fig. 3: "camera coordinates... angles"; Fig. 1). 

As per claim 10, Heinzmann in view of Park teaches the image processing 
apparatus according to Claim 1, wherein the motion estimating means applies Kalman 
filtering as said filtering (Heinzmann: See arguments made for rejecting claim 1). 

Heinzmann does not teach extended Kalman filtering. 

Park teaches extended Kalman filtering (Park: page 2564, col 1, para 4: "extended 
Kalman"). 

Thus, it would have been obvious for one of ordinary skill in the art at the time the
invention was made to implement the teachings of Park into Heinzmann since
Heinzmann suggests a system for determining face and gaze positions using Kalman
filtering in general and Park suggests the beneficial use of a system for determining
face and gaze positions using extended Kalman filtering in the analogous art of
image processing. It would have been obvious for one of ordinary skill in the art at the
time the invention was made to implement the teachings of Park into Heinzmann since it 
is well known that the extended Kalman filter is applicable to nonlinear problems 
whereas the Kalman filter is not. Therefore, one can apply the extended Kalman filter in 
order to obtain a more robust system. Furthermore, one of ordinary skill in the art at the 
time the invention was made could have combined the elements as claimed by known 
methods and, in combination, each component functions the same as it does 
separately. One of ordinary skill in the art at the time the invention was made would 
have recognized that the results of the combination would be predictable. 
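
The distinction drawn above between the Kalman filter and the extended Kalman filter can be sketched as follows. The sketch assumes a two-element state (lateral position X and depth Z) observed through a pinhole-style perspective projection u = focal * X / Z, which is nonlinear in the state and is therefore linearized with a Jacobian at each update. The state layout, function names, and numeric values are hypothetical; nothing here is asserted to reproduce Park's formulation.

```python
# Illustrative sketch only: one extended Kalman filter measurement update for
# a hypothetical state [X, Z] observed through u = focal * X / Z.

def project(state, focal=1.0):
    """Nonlinear observation function: perspective projection of X at depth Z."""
    X, Z = state
    return focal * X / Z


def jacobian(state, focal=1.0):
    """Partial derivatives of the projection with respect to X and Z."""
    X, Z = state
    return [focal / Z, -focal * X / (Z * Z)]


def ekf_update(state, P, z, r=1e-2, focal=1.0):
    """Correct the state and 2x2 covariance P using a scalar observation z."""
    H = jacobian(state, focal)                       # linearization point
    PHt = [P[0][0] * H[0] + P[0][1] * H[1],
           P[1][0] * H[0] + P[1][1] * H[1]]          # P * H^T
    S = H[0] * PHt[0] + H[1] * PHt[1] + r            # innovation variance
    K = [PHt[0] / S, PHt[1] / S]                     # Kalman gain
    innovation = z - project(state, focal)
    new_state = [state[0] + K[0] * innovation,
                 state[1] + K[1] * innovation]
    # Covariance update (I - K H) P, written out element by element.
    new_P = [[P[i][j] - K[i] * (H[0] * P[0][j] + H[1] * P[1][j])
              for j in range(2)] for i in range(2)]
    return new_state, new_P


if __name__ == "__main__":
    state, P = [1.0, 10.0], [[1.0, 0.0], [0.0, 1.0]]
    state, P = ekf_update(state, P, z=0.12)
    print(state, project(state))
```

A linear Kalman filter would require the observation to be a fixed linear function of the state; the division by Z in the projection is what calls for the linearization step here.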

As per claim 23, Heinzmann in view of Park teaches the image processing 
apparatus according to Claim 1, wherein a 3D structure of a center of a pupil on a facial
picture is defined by a static parameter and a dynamic parameter, and wherein a
gaze is determined by estimating the static parameter and the dynamic parameter
(Heinzmann: See arguments made for rejection claim 1, 2, 5, 9). 

As per claim 24, Heinzmann in view of Park teaches the image processing
apparatus according to Claim 23, wherein the static parameter is a depth of the pupil in
a camera coordinate system (Heinzmann: See arguments made for rejection claim
1, 2, 5, 9).

As per claim 25, Heinzmann in view of Park teaches the image processing 
apparatus according to Claim 23, wherein the dynamic parameter is a rotation 
parameter of an eyeball (Heinzmann: See arguments made for rejection claim 1, 2, 
5, 9). 

As per claim 26, Heinzmann in view of Park teaches the image processing 
apparatus according to Claim 25, wherein the rotation parameter of the eyeball has two 
degrees of freedom to permit rotations with respect to two coordinate axes in an eyeball 
coordinate system (Heinzmann: See arguments made for rejection claim 1, 2, 5, 9: 
alpha_x, alpha_y). 

As per claim 27, Heinzmann in view of Park teaches the image processing 
apparatus according to Claim 1, wherein a 3D structure of the 3D object on a picture
is defined by a rigid parameter and a non-rigid parameter and wherein the motion of the 
3D object is determined by estimating the rigid parameter and the non-rigid parameter 
(Heinzmann: See arguments made for rejection claim 1, 2, 5, 6, 9). 

As per claim 28, Heinzmann in view of Park teaches the image processing
apparatus according to Claim 27, wherein the rigid parameter is a depth of a feature 
point of the 3D object in a model coordinate system (Heinzmann: See arguments 
made for rejection claim 1, 2, 5, 6, 9). 

As per claim 29, Heinzmann in view of Park teaches the image processing 
apparatus according to Claim 27, wherein the non-rigid parameter is a change amount 
of a feature point of the 3D object in a model coordinate system (Heinzmann: See 
arguments made for rejection claim 1, 2, 5, 6, 9). 

Conclusion 

Any inquiry concerning this communication or earlier communications from the
examiner should be directed to Atiba Fitzpatrick whose telephone number is (571) 270-5255.
The examiner can normally be reached on M-F 10:00am-6pm.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's
supervisor, Samir Ahmed can be reached on (571) 272-7413. The fax phone number for
Atiba Fitzpatrick is (571) 270-6255.

Information regarding the status of an application may be obtained from the Patent 
Application Information Retrieval (PAIR) system. Status information for published 
applications may be obtained from either Private PAIR or Public PAIR. Status 
information for unpublished applications is available through Private PAIR only. For
more information about the PAIR system, see http://pair-direct.uspto.gov. Should you 
have questions on access to the Private PAIR system, contact the Electronic Business 
Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO 
Customer Service Representative or access to the automated information system, call 
800-786-9199 (IN USA OR CANADA) or 571-272-1000. 

Atiba Fitzpatrick 
/A. O. F./ 

Examiner, Art Unit 2624 
/Samir A. Ahmed/ 

Supervisory Patent Examiner, Art Unit 2624 



